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Abstract 

The Bayesian approach has become increasingly popular because it allows to model 
quite complex models via Markov chain Monte Carlo (MCMC) sampling. However, it is 
also recognized nowadays that MCMC sampling can become computationally prohibitive 
when a complex model needs to be fit to a large data set. To overcome this problem, 
we applied and extended a recently proposed two-stage approach to model a complex 
hierarchical data structure of glaucoma patients who participate in an ongoing Dutch 
study. Glaucoma is one of the leading causes of blindness in the world. In order to detect 
deterioration at an early stage, a model for predicting visual fields (VF) in time is needed. 
Hence, the true underlying VF progression can be determined, and treatment strategies 
can then be optimized to prevent further VF loss. Since we were unable to fit these data 
with the classical one-stage approach upon which the current popular Bayesian software 
is based, we made use of the two-stage Bayesian approach. The considered hierarchical 
longitudinal model involves estimating a large number of random effects and deals with 
censoring and high measurement variability. In addition, we extended the approach with 
tools for model evaluation 

KEY WORDS: Bayesian modeling, Hierarchical structure, Longitudinal data analysis, 
Two-stage approach. 
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1 Introduction 


Since the introduction of MCMC sampling by Gelfand and Smith [8] and the development 
of the BUGS software [19] the Bayesian approach has become tremendously popular in 
various application areas, but especially to fit models to complex data structures. But 
with the years it also became clear that MCMC sampling can be computationally quite 
cumbersome, and even prohibitive, for fitting complex models to relatively large data sets. 
Several attempts have been made to look for alternative computational procedures and 
software, with notable examples such as INLA [24] and STAN [11], While this newly 
software can sometimes speed up the computations considerably, the computational gain 
is not always obvious upfront and for some advanced models the new developments may 
not be suitable yet. In addition, the majority of the practical Bayesians still use BUGS- 
related software. In this context, Lunn et al. [18] proposed to fit a hierarchical model in 
two stages. The authors claim more model flexibility in this way, but advocate the use 
of their procedure especially for its computational properties. In this paper we further 
illustrate the use of the two-stage approach on a far more complex hierarchical data 
structure of glaucoma patients. In addition, we extended the approach with an additional 
sampling step to allow for the calculation of model selection and model evaluation criteria. 

Our modeling approach is motivated by data from the Glaucoma Study conducted 
by the Rotterdam Eye Hospital in the Netherlands. According to the World Health 
Organization (WHO), glaucoma is one of the leading causes of irreversible blindness in 
the world [14]. Adequate treatment may slow down the disease, possibly even halting 
its progression. Evaluation of a longitudinal series of visual fields (VF), as measured by 
standard automated perimetry (SAP), provides a way to detect early evidence of glaucoma 
and to determine functional deterioration. However, due to the subjective nature of this 
technique, SAP is prone to large variability. In order to measure the true progression of 
the disease, this variability needs to be taken into account. The Glaucoma Study provides 
a unique database with a long follow up time. Although methods may have existed to 
model this type of data, the difficulties in extracting it from the device has made this type 
of data rare and hence has prevented much research on the topic. 

The response variable of interest are the sensitivity estimates which describe the level 
of differential light sensitivity at different locations within each eye. The sensitivity esti¬ 
mates are left-censored due to the limitation of the device. Models which take into account 
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this type of censoring, such as the Tobit model, have been described in the literature [28] . 
Our interest lies in modeling the latent, true values rather than the observed sensitivity 
estimates for two reasons. Firstly, clinical interest lies in predicting the disease progression 
rather than the observed sensitivity estimates. Secondly, using the latent scale allows us 
to use a simpler model than when directly modeling the observed data. The hierarchical 
structure of the data consists of 4 levels, namely, (1) the individual, (2) the eye (3) the 
hemifield and (4) the location. There is a vast amount of literature that addresses hier¬ 
archical mixed effects models, for both frequentist [29] and Bayesian [17, 21] approaches. 
We model this complex data structure using a Bayesian hierarchical mixed effects model 
with cross-classified random effects. Hence, we combine both spatial and time effects. 
One of the difficulties in modeling VF data is the amount and type of measurement error 
or variability in the sensitivity estimates. This may be due to measurable factors, such as 
season, time of day and reliability indices, or unknown transient factors, such as fatigue, 
lack of concentration, or delayed reaction time. Although their magnitudes may vary, 
these factors affect all locations belonging to the same VF. We propose to model them 
as Global Visit Effects (GVEs). Furthermore, there is an inverse relationship between 
sensitivity and variability. For example, measurement error in the VFs increases with 
damage, and hence low sensitivity estimates have high variability. Therefore, it is naive 
to assume a constant variance over the wide range of sensitivity estimates. In this paper, 
we relax this assumption in order to incorporate this relationship. A problem with high 
dimensional data and complex data structures, is that it is sometimes difficult or even 
impossible to model them with standard MCMC algorithms. Lunn et al. [18] proposed a 
two-stage approach, which allowed us to simplify the problem while still benefiting from 
the advantages of a full Bayesian model. However, one of the disadvantages of this ap¬ 
proach, is that it is not possible to directly obtain the random effects estimates needed 
for most model evaluations. We address this issue by extending the two-stage approach 
to be able to determine these estimates. 

Our aim is to model this complex data structure in order to obtain better estimates 
of the true evolution of the sensitivity over time, so that treatment strategies can be 
optimized to prevent further progression of VF loss. The structure of the paper is as 
follows. In Section 2 we give further details on the motivating data set and introduce the 
research questions that triggered our modeling approach(es). In Section 3 we describe the 
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models used in the analysis. In the subsequent section we briefly review computational 
aspects of the analysis. Model comparison is dealt with in Section 5. In Section 6 we 
apply our models to the Glaucoma Study data. Section 7 contains a concluding discussion. 
Further details regarding the modeling approach are provided in an appendix. 

2 Motivating data set: the Glaucoma Study 

2.1 Description of the project 

The Glaucoma Study is a prospective cohort study conducted by the Rotterdam Eye 
Hospital in the Netherlands. This is an ongoing study which began in 1998. Inclusion 
criteria included glaucoma diagnosis and an age range of 18 to 85 years. In total, 139 
patients, consisting of 80 (57.6%) men and 59 (42.4%) women, were recruited with a 
mean follow-up of 10.5 years. Follow-up data were collected at approximately 6-monthly 
intervals. All patients gave their written informed consent for participation. All research 
procedures followed the tenets set forth in the Declaration of Helsinki. Furthermore, all 
of the data that was used in this analysis has been made available online at http://rod- 
rep.com. 

Sensitivity estimates were measured at 52 test locations within each eye, or 26 test 
locations within each hemifield (excluding two locations corresponding to the blind spot) 
as shown in Figure 1. The VFs were tested using the Humphrey Field Analyzer with the 
24-2, white-on-white test strategy using the Full Threshold algorithm. The light source 
can be attenuated in the range from 1 to 10,000 times. On the decibel (dB) scale an 
attenuation x is defined as s = 10 log 10 (x), or x = 10 s / 10 . The lowest sensitivity that 
can be detected by this perimeter is 0 dB, although negative values could in fact occur if 
it were not for the limitations of this device. The highest sensitivity that can be detected 
is 50 dB, however few humans are capable of seeing a stimulus less than 40 dB, which 
is 1/10,000 of the maximum intensity of the instrument (or 1 asb). Thus, for practical 
purposes, the useful intensity range for white light testing is from 0 to 40 dB with a 
background illumination of 31.5 asb. [1]. 
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Figure 1: Fundus photo of a left human eye with the 54 test locations for the VF test represented 
by white dots. 

2.2 Previous research 

Parameters such as the mean deviation (MD) and visual field index (VFI) summarize 
the 52 sensitivity estimates into single values which can be used by the clinicians when 
optimizing treatment strategies. Longitudinal modeling of these VF summary parameters 
has been done before [2, 4, 13, 15]. Modeling of individual test locations is potentially of 
greater interest, because it provides additional information such as the spatial nature of 
the fields which is otherwise lost in global parameters. In previous research, each location 
was analyzed as an independent sample [5, 6, 20]. However, separate location-specific 
regression models are not able to use any information from the data set as a whole. 
Multilevel mixed-effects models provide a better fit to the data than separate regression 
models by accounting for group effects and/or within-group correlation [29]. This was 
shown in the context of global VF measurements by Pathak et al. [22], 

In glaucoma, variability is presumably related to fatigue effects and response errors, 
whereby sensitivity estimates decrease over time [3, 12]. Differences in fatigue effects, 
between the inferior and superior hemihelds within an eye have been demonstrated [12], 
Furthermore, this effect may differ between the first and second eye at the same visit. The 
number of false-negative answers have been shown to be higher in eyes with held loss. This 
may be explained by an increased variability in sensitivity estimates typically found in such 
eyes [3, 25]. A common approach to reduce measurement variability is to average multiple 


5 










measurements. For example, random uncorrelated measurement errors that are present 
in the point-wise sensitivity estimates are reduced when calculating summary parameters 
such as the mean deviation (MD). Other errors, however, are spatially correlated and 
affect the whole VF. One group of such errors are measurable factors, including season, 
time of day and reliability indices, which have been evaluated before [13]. Although these 
factors are statistically significant, they are rather small and hence only explain a small 
part of the observed global variation in VFs. 

The inverse relationship between variability and sensitivity has been described in the 
literature. Henson et al. (2000) [10] found that this relationship is well represented by the 
function log (SD) = A + B x sensitivity(dH), where A and B are 2.81 dB and -0.066 dB 
respectively for normal eyes and 3.62 dB and -0.098 dB for glaucomatous eyes. Russell 
et al. (2012) [25] showed that the distribution of residuals is relatively concentrated at 
high VF sensitivities (26 to 36 dB) but stretches substantially as the sensitivity estimates 
decrease to a level of 10 dB. Sensitivity estimates near 10 dB are associated with residuals 
spanning almost the entire dynamic range of the instrument. This could be caused by a 
loss of ganglion cells (due to glaucomatous damage), or relocation of the stimulus to the 
peripheral visual field where there are fewer ganglion cells [27]. Zhu et al. (2014) [30] de¬ 
scribe a method to detect change using an inferential statistical model which incorporates 
the non-stationary variability using a mixture of Weibull distributions. 

Although there is a wide range of literature which discusses these aspects, the majority 
of previous work deals with the global indices or treats each point-wise estimate as an 
independent sample. Furthermore, these aspects have been addressed separately. Hence, 
it is clear that an approach which takes into account the complex structure of the data 
and considers all of the aforementioned problems, is needed. We will address censoring, 
the hierarchical structure, the global variation as well as the relationship between the 
variability and sensitivity. 

3 Statistical Models 

Modeling the sensitivity estimates is beneficial for the evaluation of the progression of VF 
loss. By incorporating biological effects into the model, we aimed to improve the model 
fit and hence provide a better method for modeling this progression. This was done by 
building the model up sequentially. 
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3.1 Censoring 

It is important to note that unseen sensitivity estimates are indicated on the VF print out 
as < 0. They are smaller than zero because the instrument is unable to determine such 
sensitivities. Thus a model which defines the relationship between time and the latent, 
true sensitivity value is needed. The relationship between the observed y* and the latent, 
true sensitivity value y is given by, 


y* = y X I(y >0) + 0 X I(y < 0). 


3.2 Hierarchical Model 

We propose using a Bayesian hierarchical mixed effects model [17, 21] to analyze the 
glaucoma data. This model is able to take into account both the within subject and 
between subject variability. Furthermore, we capitalized on the common features within 
each eye by taking into account the correlation between measurements belonging to the 
same eye. In addition, correlation of VF measurements within the inferior and superior 
hemifields, separated by the horizontal raphe, was assumed to be higher than between 
hemifields. Hence, the hierarchical structure of the data consists of 4 levels, namely, (1) 
the individual, (2) the eye (3) the hemifield and (4) the location. Let fi correspond to the 
regression parameters and years i4 represent the time between measurement t and the first 
measurement for each individual i, ranging from 0 to 10.5 years. The individual-specific 
intercept and slope are represented by a, the eye-specific intercept and slope by 7 , the 
hemifield-specific intercept and slope by 77 , and the location-specific intercept and slope 
by A. We then have, for individual i = 1,..., V; eye e = 1, 2 ; hemifield h = 1, 2 ; location 
l = 1,... ,26 and timepoint t = 1 ,...,Xj, 


Model 1: 
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3.3 Visit Effect 

Junoy Montolio et al. (2012) [13] explicitly modeled the global variations with known 
factors such as season, time of day and reliability indices. However, we speculated that 
other transient factors, such as fatigue, lack of concentration, or delayed reaction time 
may play a more important role. Since all these (as well as possibly other) factors, affect 
all locations belonging to the same VF, we propose to take them together and to call them, 
as well as model them as the Global Visit Effects (GVEs). In this way, we can account 
for both the known and the unknown factors. Hence, the GVE accounts for all factors 
that affect all measurements of the same eye at each visit. To illustrate the importance 
of these factors, we show in Figure 2 the VFs over time of one eye, where all locations 
have a drastic decrease in sensitivity at around 1 year. From the longitudinal profiles, it 
is evident that this decrease is caused by something that affected all VF measurements 
of that visit, rather than by actual damage. To account for the visit-dependent offset at 
all locations, or GVE, we included a parameter, (j)i e t, in the model to capture the offset 
at every visit j for each eye k within each individual i. This gives, 

Model 2: 


Uiehlt — k'iehlt ^iet T (-iehlt 

~ ^ iehlt + e iehlt■ (2) 

From an initial exploratory analysis, we observed a number of spikes in the distribution 
of the visit effects. To accommodate these spikes, we assumed a t-distribution for 
The t-distribution allows greater flexibility in the distribution of random effects compared 
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Figure 2: Retinal sensitivity estimates over time for each location of the visual held in the left 
eye of a single glaucoma patient. A decrease in the sensitivity estimates can be seen in all 
locations at around 1 year. The longitudinal profile of the MD values over time are shown on 
the right. The visit-dependent decrease is also clear at around 1 year for the MD. 


to the normal distribution, and can handle heavy tails in random effects distributions [16]. 
Hence, we let, 


<Aet ~ t(0, <4,3), 

where t(/r, <x 2 , df) denotes the generalized t-distribution with mean /j, scale parameter a, 
and df degrees of freedom. 

3.4 Relationship between Variability and Sensitivity 

There is an association between a decline in VF sensitivity and an increase in response 
variability. However, values lower than 0 dB cannot be measured. This inherent censoring 
process introduces a positive bias at low sensitivity estimates, which is made worse by 
the increased variability for low sensitivity estimates. We assumed a linear relationship 
between the expected values of the sensitivity estimates and the logarithm of the standard 
deviation. However, since we were interested in modeling the latent sensitivity estimates, 
we extrapolated this linear relationship for predicted sensitivity estimates below 10 dB. 
This can be seen in Figure 3. In this exploratory analysis, we found that the relationship 
was well represented by the function log(S'D) = A + B x sensitivity(AB), where A 
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and B are 2.60 dB and -0.06 dB respectively. We extended Model 2 to incorporate this 
relationship such that, 

Model 3: 


Uiehlt 


..(2) I fZ 

Uiehlt ' '-iehlt i 


and 


log (cTiehlt) = f{E{y ie hlt)} 

= K + Ktiiu 


(3) 


where / is a linear function. A summary of all the parameters and their definitions 
for all the models is shown in Table 1. 



Predicted sensitivity estimate (dB) 


Figure 3: Bubbleplot representing the mean logarithm of the standard deviation for different 
predicted sensitivity estimates determined using linear regression for each location. The pre¬ 
dicted values were subdivided into groups with width 5 dB. The empty bubbles correspond to 
the hypothetical values, corresponding to the censored measurements, for the mean logarithm of 
the standard deviation for the predicted sensitivity estimates after extrapolation. The bubbles 
are scaled to the logarithm of the number of observations. 
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Tabic 1: Summary of parameters included cumulatively in each of the models 


Model 

Parameter 

Definition 

Model 1 

Viehlt 

Latent sensitivity estimate 


Po 

Population-averaged intercept 


Pi 

Population-averaged slope 


»0i 

Individual-specific intercept 


Oi\i 

Individual-specific slope 


7o i 

Eye-specific intercept 


Hi 

Eye-specific slope 


Vo i 

Hemifield-specific intercept 


Vli 

Hemifield-specific slope 


Ao i 

Location-specific intercept 


Ai i 

Location-specific slope 


a 2 

Variance 

Model 2 

4*iet 

Global visit effect 



Global Visit Effect variance 

Model 3 

Po 

Intercept in logarithm of the standard deviation 


pi 

Slope in logarithm of the standard deviation 


4 Estimation Approach 

4.1 One-stage approach 

The Bayesian approach takes into account the uncertainty in all model parameters and 
allows for prior information to be incorporated. Furthermore, MCMC algorithms allow 
greater flexibility by relaxing the strong parametric assumptions commonly used in most 
frequentist hierarchical models [17, 18]. The classical Bayesian approach is one-stage hi¬ 
erarchical modeling, which has the advantage that subject-specific and overall parameters 
are estimated simultaneously. However, for a (relatively) large data set, this approach can 
be difficult or even impossible to implement for complex models with standard MCMC 
software. In our case, we had a total of 45,005 parameters which needed to be esti¬ 
mated. As a consequence, we were unable to achieve convergence in a realistic time frame 
and experienced computer memory limitations when using WinBUGS or JAGS. For such 
situations, a computationally more efficient method is needed. 

4.2 Two-stage approach 

Lunn et al. [18] proposed two-stage Bayesian hierarchical modeling. The glaucoma data 
also exhibit an hierarchical structure, but of a more complex nature. Figure 4 illustrates 
the hierarchical structure of the glaucoma data, as well as the cross-classified random 


11 





effects, divided into two stages. The two-stage approach allowed us to simplify the problem 
by splitting hierarchical models with M levels at level m*. Independent parameters of 
interest at level m* are obtained in stage 1 and used as proposal distributions for those 
parameters in stage 2. Lunn et al. illustrated this method using models with two and 
three levels. We applied this to a more complex model with four levels. In our case, we 
split the levels at the individual level, treating each individual as their own sample. These 
individuals were then analyzed independently before combining them to obtain population 
level estimates. 



Figure 4: Illustration of the hierarchical structure of the data divided into the first and second 
stages as done in the two-stage approach. 


4.2.1 First stage 

In the first stage, we analyzed each individual separately. Without loss of generality, we 
only show this for Model 3. This becomes: 
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One important detail about the two-stage approach is that it allows the individual 
variances to differ, i.e. X 7 j, X^j, X^ and but also v ie hit, since the regression coefficients 
are now allowed to depend on the subject. This is in contrast to the one-stage model 
which requires the variances to be the same, i.e. X 7 , X^, X^ and Hence, the two- 
stage approach is more flexible, as it can account for these differences if they are present 
in the data. In the Bayesian procedure prior distributions need to be assumed for all 
parameters. Explicit expressions for the priors are given in the Appendix. In order to 
prevent the second-stage sampler from becoming stuck near local posterior modes, large 
independent samples are needed from this first stage [18]. To achieve this, we ran 200,000 
iterations with a burn-in of 150,000 and thinning of 10, resulting in 138 samples of 5,000 
iterations each for each parameter. 

4.2.2 Second stage 

In the first stage, a, and (3* were treated as fixed effects. These parameters need to 
be combined in the second stage to obtain the population-averaged effects, /3 and /3*. 
We denote them as 6i. In the case of a meta analysis, the random effects in the first 
stage, which we denote as Lj, may not be of direct interest. Hence, these terms can 
be treated as nuisance parameters as done by Lunn et al. [18]. However, for clinical 
applications such as ours, these may be important. In order to avoid further computational 
problems, we took only the covariance matrices of the random effects in the first stage 
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to the second stage, which allowed us to re-estimate the random effects in an additional 
step. Since each of the elements in the matrices were treated as separate parameters 
in the second stage, Cholesky decomposition of the covariance matrices of the random 
effects, S 7i , and respectively, was used. More specifically, for each of the above 
covariance matrices a full rank lower triangular matrix U with real and positive diagonal 
entries was generated, ensuring that UU T is positive definite. We denote the Cholesky 
decomposition factors for all of the covariance matrices by C t . Hence, we let {9i,Ci} 
represent parameters of interest from the first-stage. The samples of these parameters 
were then used as proposal distributions within a Metropolis-Hastings step in the second- 
stage to obtain {9, C} = {#;, C\, i = 1,..., N}. Three chains were initialized with different 
starting values for all models determined from the first-stage samples. This was done 
using the minimum value, the mean value and the maximum value for each parameter for 
every individual. Upon convergence, we computed the posterior mean, median, standard 
deviation with the equal tail 95% credible interval (Cl) for all parameters of interest. 

5 Model Evaluation 

Standard approaches are applicable to the results from the first stage since this stage 
represents a standard analysis. Hence, we evaluated the models at this stage for each 
individual separately using posterior predictive checks (PPC), such as the y 2 -test statistic. 
A further comparison of the models was done after the second stage using the Deviance 
Information Criterion (DIC) to determine the overall best model. 

5.1 Posterior predictive check 

We denote all parameters for individual i from the first stage, i.e. {0*, Cj, L*}, as Vu Let 
ipj,... be the converged Markov chain from pfyi \ Hi). Furthermore the vector of 
all responses for the ith individual is denoted by r/j. The posterior predictive P-value 
(PPP) for a discrepancy measure, D(yi \ ift-) is then calculated and replicated data yf is 
sampled fromp(r/j | ). D{y^ : ^) can then be computed for k in {1,..., K}, and po can 

be estimated by, 
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( 5 ) 


k =1 

Here, the Gelman x 2 -test statistic [9] was used as the discrepancy measure to calculate 
the PPC for each individual. This is defined as: 


H 


d(i= 


k M2 


e=l h =1 Z=1 t= 1 


['Uiehlt E(Uiehlt I )] 

var {y ieh it \ ip!f) 


( 6 ) 


(2) 

where E(yi e hit) is dehned as p y iehlt - A small value indicates a bad model fit. The above 
predictive P-values can then be contrasted against a uniform distribution to evaluate 
globally model fit for each individual. In general, if po is smaller than 0.05 or larger than 
0.95, then this is an indication that the model might not fit the data well. Note that this 
procedure is, however, conservative because the data is used twice: one for model fit and 
one for model evaluation, see e.g. [17]. 


5.2 Deviance Information Criterion 

In a Bayesian framework, a common tool for model evaluation is the Deviance Information 
Criterion (DIC) proposed by Spiegelhalter et al. (2002) [26]. The DIC is defined as: 


DIC = D({9, C}, L) + 2 Pd = D({0, C}, L) + p Dl (7) 


where 


p D = D({d,C}L)-D({0,C},L). 

In the definition of the DIC, we have the fixed effects parameters as well as the random 
effects which are treated as nuisance parameters in the two-stage approach. One of the 
disadvantages of the two-stage approach is not being able to directly obtain the random 
effects estimates. Since most model evaluation methods, such as the Deviance Information 
Criteria (DIC), require the random effect estimates for the computation, it is not clear 
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how to evaluate models using this approach. To overcome this limitation, we propose 
an extension of the two-stage approach, by including an additional step based on the 
Method of Composition in combination with a Metropolis-within-Gibbs technique. More 
specifically, for the calculation of the DIC we are required to obtain a sample of the 
random effects from their posterior distribution p(Li \ yi), which is written as: 


p{Li | yi) = J p{Li | yi,ni)p(Qi \ (8) 

where f2* = (/3o, Pi, Po, Pt, « 0 o «ii, S 7 , £,,, up is the vector of all parameters of 
main interest; frepresents the sampled values from the second stage. Identity (8) sug¬ 
gests that we can use the following simulation scheme. This is a sampling algorithm based 
on the Method of Composition in combination with a Metropolis-within-Gibbs technique, 
which is applied to each individual. For the ith individual the computations are done as 
follows: 

Step 1: For iteration k, let the parameters estimated in the second stage of the two- 
stage approach be denoted by . 

Step 2: Given , we sample: 


TOze? Tlie? VOiehi Vlieh-) ^Oiehli ^liehl Q'lld 4>ieti 
(k) 

which we denote as L\ . This is done using the Metropolis-within-Gibbs technique. 
Thus, for each sequence of generated parameters from the second stage, we apply MCMC 
sampling to obtain these estimates: 

Step 2A: Initial values are determined using an optimization routine. 

Step 2B: A random walk Metropolis Hastings algorithm is used for each of the levels. 
This is done iteratively, to take into account the correlation between the levels. 

(k) 

Step 3: After an inital burn-in, we save the last estimate for each parameter in L\ . 
Step 4 : This is repeated for K iterations, resulting in: 


(1) r(2) 

i > i 


t {K) 


Hence, in combination with the results from the second stage, we have now obtained 
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Ordered PPP-values 


all the parameter estimates which are needed to compute the DIG. 


6 Application to the Glaucoma study 

For this analysis we included both eyes from the 139 individuals belonging to the Glaucoma 
study. After excluding VFs with unknown reliability as indicated by the instrument, 138 
individuals and 276 eyes remained. This included 4,758 VFs, resulting in a data set 
consisting of 14,352 VFs and 247,520 location-specific sensitivity estimates. All analyses 
were done taking into account censoring, and hence using the latent sensitivity values, 

Viehlt- 


6.1 Results 

The two-stage approach is advantageous, as it allows us to do exploratory analyses at the 
individual level in order to simplify the model before combining the samples in the second 
stage. In order to compare the models, we can evaluate the outcome of the PPC using 
graphical output. The PPP-values denoted by pr> were computed for every individual. 
Figure 5 shows the ordered PPP-values for each of the models. Model 1 has a mean 
pD = 0.30, Model 2 a mean p£> = 0.30 and Model 3 a mean po = 0.50. From this, it 
appears that Model 3 has the best fit. This approach gives a good indication of whether 
the models fit the data, specifically for each individual. 





0.0 0.2 0.4 0.6 0.8 1.0 


Ordinates of the Uniform quantiles 


Ordinates of the Uniform quantiles 


Ordinates of the Uniform quantiles 


Figure 5: Posterior predictive check for each of the models across all individuals 


An example of the model fits for I location is shown in Figure 6. The posterior 
summary statistics from the second stage are listed in Table 2 for each of the models. A 
difference in DIC of more than 10 indicates that the model with the lowest DIC has a 
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better fit. Using the DIC to compare the models, Model 2 (DIC=—9.045e +O7 ) performed 
better than Model 1 (DIC = —8.827e +07 ), with Model 3 (DIC = — 1.792e +26 ) performing 
the best overall. Using the results from Model 3, the population intercept (po) was 19.82 
dB with an average slope (/%) of -0.31 dB per year. The intercept (/3 q) and slope (PI) 
for the logarithm of the standard deviation was 2.82 dB and -0.08 dB respectively. This 
corresponds to the 2.60 dB and -0.06 dB which was found in the exploratory analysis 
shown in Figure 3. 



Time (years) 

Figure 6: Scatter plot representing the retinal sensitivity estimates over time for 1 location of 
the VF. The lines represent the model fits for each of the 3 models 


Table 2: Posterior summary statistics for the three models using the two-stage approach 


Model 

1 




2 




3 




Parameter 

mean 

sd 

95% Cl 

mean 

sd 

95% Cl 

mean 

sd 

95% Cl 

Po 

18.95 

0.72 

(17.53 : 

20.36) 

20.42 

0.73 

(18.95 

; 21.84) 

19.89 

0.77 

(18.36 : 

21.37) 

Pi 

- 0.22 

0.0 

(-0.33 : 

-0.13) 

- 0.21 

0.05 

(-0.31 

; -o.ii) 

-0.31 

0.05 

(-0.41 

; 0 . 20 ) 

Po 









2.82 

0.06 

(2.70 : 

2.95) 

Pt 









-0.08 

0.08 

(-0.08 ; 

-0.07) 

<x 2 

13.42 

0.66 

(12.17 : 

14.75) 

11.51 

0.56 

(10.45 

; 12 . 68 ) 





a \ 





0.62 

0.05 

(0.52 

; 0.73) 

1.87 

0.04 

(1.81 : 

1.96) 

DIC 



—8.827e +U7 



—9.045e +U7 



— 1.792e +2b 
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6.2 Clinical Implications 

With the GVE, we account for those factors, as well as those which can not be measured 
such as fatigue and delayed reaction time. Including the GVE showed a significant im¬ 
provement in the model fit. Hence, by taking into account the GVE we were able to take 
into account a large part of the variability and obtain better estimates of the true rate 
of progression. By including the relationship between variability and sensitivity shows a 
further improvement in the model fit. The function which describes this relationship was 
consistent to that found by Henson et al [10], however it was not shown previously how 
to include this relationship in a model, or whether including it would improve the model 
fit. By including both of these aspects, we were able to improve the estimation of the 
true underlying progression and determine the real evolution of the sensitivity over time. 
Hence, these improved estimates could aid clinicians in optimizing treatment strategies. 

7 Discussion 

In this paper, we proposed a method to model point-wise VFs taking into account the 
complexity of psychophysical testing of visual function in glaucoma. The model is advan¬ 
tageous in dealing with the high measurement variability, and could be extended for the 
prediction of future VFs. Although it was possible to use the one-stage approach with 
simplied versions of the model or with smaller datasets, it was not possible to perform 
these analysis on the full data with a complex model as it was with the two-stage ap¬ 
proach. The two-stage approach can be implemented in standard MCMC software. The 
relevant computations for the first-stage can be carried out in JAGS [23], WinBUGS or 
OpenBUGS [19] software and the second-stage using OpenBUGS software [18]. However, 
for the second stage an add-on program is needed. For more details on setting up Open¬ 
BUGS for performing the two-stage analyses, we refer to [18]. More information regarding 
the computations done in this paper can be obtained by emailing the first author. These 
computations can be easily tuned to adapt to other data sets by any practitioner. 

The two-stage method is advantageous as it allows us to do exploratory analysis at an 
individual level. Hence, we are able to simplify and improve the model before combining it 
at a population level. Limited simulations showed that the one- and two-stage approaches 
gave similar results if the variances were the same for all individuals. The two-stage 
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approach assumes a more flexible method. However, there is the additional difficulty in 
constraining the parameters across individuals. One disadvantage of this approach is that 
it does not provide the required components to evaluate the fit and predictive ability of 
the model using the Deviance Information Criterion (DIC). In order to calculate the DIC 
and compare different competing models for our data fitted using the two-stage approach, 
we suggested a Monte Carlo scheme based on a Metropolis-within-Gibbs algorithm. 

Other issues, which we see as future research directions, is to look at the optimal choice 
of the level where the data should be split. Extensions include exploiting the spatial nature 
of the data and capitalizing on the specific spatial organization of the nerve fibres in the 
eye [7]. 
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A Appendix 


A.l One Stage Approach 

A. 1.1 Full Model 


Uiehlt = f 3 0 + / 3 iyears^ + a 0i + amyears^ + 70 ie + 7i ie years it + 


VOieh + Vliehy ea - TS it + A) iehl + ^liehlY^^ij + <foet + t-iehlt (A.l) 


where e iehU ~ N( 0, a 2 ehlt ) and log(a iehU ) = + ^* ehlt - 


A.1.2 Priors 

In the Bayesian procedure prior distributions need to be stipulated for all parameters. 
When no prior information is available then the prior distribution should reflect this. In 
this case a vague prior is a natural choice. 


Pb 

rsj 

N( 0,10 s ) for 6 = 0,1; 


P* q 


IV(0,10 8 ) for q = 0,1 


Oii 

= 

(2?:)~AT((8),E a = 

( Sail Sq,12 \ \. 

\ S a 21 S a 22 J b 

Tie 

= 

® ~A((0),E 7 = 

/ S 7 n S^i2 
' S 7 21 S 7 22 ' '' 

Vieh 

= 

(S)~AT(( §),£,= 

( 11 S ? ,i2 \ \ _ 

' S?j21 S,,22 

^iehl 

= 

te“)~"«8 ). s A 

= (^11 SU2)) 

v ^A21 ^A22 ' ' 

4*iet 

r^i 

t(0,a^,3) and 


€iehlt 

r-sj 

N(0,a 2 ). 



The variance was given a vague inverse gamma prior. The covariance matrices of the 
random effects, i.e. S 7 , T, v and Y,\, were given a vague inverse Wishart distribution with 
degrees of freedom equal to the number of dimensions and small diagonal values for the 
scale matrix. 
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A.2 Two Stage Approach 

A.2.1 Full Model 


Vieh.lt = A) A /?iyears it + a 0 i A ai- t years it + 7 0ie + 7 i ie years^ A 

VOieh I" VliehY ea - rs it A -^0 iehl A ^liehlY^^-^^it “I” &iet A €iehlt (A.2) 

where e ie «t ~ N(0, <j 2 iehlt ) and log((j iehU ) = A A>* eWr 

A.2.2 Priors 


4>iet ~ 

O-gi ~ 


lie = 


Vieh — 


iehl — 


t( 0 , <rji, 3 ); <rj. ~ A(cr^E 0 ) 

iV(0,10 s ) for 5 = 0,1; 


( 


TOie 

Tlie 


~A((°),E 7i = 


/ ^7lli ^712 i \\. 
V £ 7 2H S 7 22i 


/ VOieh 
\ ?7lieh. 


~jv((°),e* = 


/ ^r 7 l 2 i \\ 

V £^211 ^r)22i 


^0 iehl 
•^1 iehl 


^(( 8 ),s Ai = ( 


^Alli S A 12 i 
SA21i Sa22i 


)) and 


e ieMt ~ A(0, <7iehlt)- 


Using Cholesky decompostion, S 7i , E, ?i and E^ become, 

C rii ~ N(C n , 10 s ) for r = 1, 2, 3 
C rr)i ~ N(C rr ), 10 s ) for r = 1, 2, 3 
C rXi ~N{C rX , 10 s ) for r = 1,2,3 


A.2.3 First Stage Model 


Viehlt = otQi A aijyears^ + 7 0ie A 7 i ie years it + 77 0ieh + ??n e hyears it + 

■^0 iehl A -'hie/iiysarSjj A (foiet A tiehlt (A.3) 

where e iem ~ JV(0, a 2 ehlt ) and log(a iehU ) = /% A PuH* ehlt - 
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A.2.4 First Stage Priors 


~ N(0, 10 8 ) for q = 0,1 
a g i ~ N( 0,10 8 ) for g = 0,1 
~ t(0,0-^,3) 

7*e = (70ie;7lie) ~-/V 2 (( 0 , 0 ) , X 7i ) 

7ieh (VOielii Vlieh) ~ -^2 ((0,0) , 

\ehl = (^Oiehh ^liehl) T ~ A 2 ((0, 0) T , S A .). 

The variance was given a vague inverse gamma prior. The covariance matrices of the 
random effects, i.e. X 7i , S T); . and £ Ai , were given a vague inverse Wishart distribution 
with degrees of freedom equal to the number of dimensions and small diagonal values for 
the scale matrix. 

A.2.5 Second Stage Priors 


/?*• 

~ A(/3*,10 8 ) 

for q = 0,1 

Oil 

= (aoi,«ij) T 

~ N 2 ((Po,Pi) T 




C r ^n 

~ N(C rj , 10 8 

) for r = 1, 2,3 

Crrji 

~ N(C rv , 10 8 

) for r = 1, 2,3 

CrXi 

~ A(a A ,io 8 

) for r = 1, 2,3 
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B Methods 


Let, 

P* = {{P^Pu), fori = l,...,iV} 

a = {(a 0 i,au), for i = 1,..., N} 

C 1 = {( C 1 7 ., C 27i , C 37i ), for i = 1,..., IV} 

G, = {(Gr;.,, Gr;,: ■••• C'.'ir),), for j = 1, . . . , iV} 

G = {(C'iA i ,C , 2 A i ,---,C , 3 A i ), for * = 1,...,N} 

Then, from Lunn et al. (2013), 6 = parameters of interest: a, P*, C 7 , C^, Ca 
A = nuisance parameters: 4>, 7 , r/, A 
H = mean of 0 : p^\ P^ 2 \ C 7 , C, ? , Ca 
S = covariance matrix of 0 : X^i), S^( 2 ), Sc^, £<?„, Sc A 

B.l Full Model 

The joint posterior distribution is given by, 




^/ 3 ( 1 ) 5 ^/ 3 ( 2 ) 5 5 ^‘Crj 5 i®') ft 1 ^75 ^775 0 ? Tj Vi ^|y) 


ocp(/3 (1) )p(^ (2) )p(C 7 )p(C, ? )p(Ca)p(C 7 )p(C, ? )p(Ca)p(S /3(1) )p(S /3(2) )p(S C7 )p(S c?? )p(S c Jx 

N 

n {p(:%kb : P*, C^C^CxiAi, 7 i, Vi, \i)p(ai\P {1) , £0(1)) p(A*|/3 (2) , £0(2)) X 

7=1 

p(C 7 i |C 7 , ScJMCaJCa, T, Cx )p(<l>,'y,r), A)} (B.l) 


B.2 First Stage 

We analyse all individuals independently from the joint posterior distribution of each 
a t , P*, C 7i , C Vi , C\ t conditional on y t alone, 

p{ a i, Pi, C li7 C Vi , C\ v 4>i, 7i, Vi, \\yi) oc p{yi\a i ,P*,C li ,C r]i ,Cx i ,^i,'yi,Vi,h) x 

p(a u P*,C ri ,C ni ,C Xi )p(<Pi, 7 i, Vi, A») (B.2) 
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B.3 Second Stage 

From the distributions in (2.1) these are given by, 


p(^\ C 7 , C v , C x \^ m ,E m , E c „ S Ca , a, /?*, C 7 , C, ? , C A , ft 7,7, A, y) 

N 

oc p(^\f5^,c^ C v , C x ) ft*, C 7i , Cft, £7 Aj |/ 3 (1) ,/3 (2) , C 7 , (ft, Ca, 


2=1 


S/3(l), ^( 2 ), Sc 7 , Sc,, £ Ca ) 


(B.3) 


^ 1 ), S^pj, Sc 7 , E c „ Ec A |/? (1) , /3 (2) , (ft, (ft, a, ft*C 7 , (ft, (ft, ft 7, V, A, y) 

N 

oc p(£/3 (1) , S^ (2) , Sc 7 , Sc,, Sc A ) n p(ai, ft*, C 7i , C Vi , Cft l/S^, (3^ , (ft, (ft, (ft, 

2=1 

S /3 (i),S^( 2 ),Sc 7 ,Sc„Sc A ) (B.4) 


p(«i, ft*, Cft, (ft, Cft, ft, 7 i, Pi , Aj |/? (1) , /3 (2) , (ft, (ft, Cx , S^), S^, Sc 7 , S c ,, S Ca , y) 
(xp{yi\a.i, ft*, Cft, (ft, Cft, ft, 7i, , A* , ft, 7;, %, K)p(&i , ft*, Cft, Cft, C' Ai |/3 (1) , /3 (2) , 
Cft (ft, (ft, , S^ 2 )> Ec 7 , Sc,, Sc A ).p(ft, 7*> Vii Aj) 

i = 1,..., IV. (B.5) 


The distributions from (B.3) and (B.4) are available in closed form and can hence we can 
sample from them directly by using standard algorithms. For the distributions (B.5) we 
use the distributions in (B.2) as the proposal distributions within a Metropolis-Hastings 
step. For this, the target-to-proposal ratio can be simplified to, 



C{oii, , C 7i , Cr*, C\ii <t>i, 7 it Vi, Aj) 


p{yi I otj , p* ,c 7i ,c, ?i , c Xi Aha,m A:) ^ x 
p(yi\ai, PiiCliiCruiCx^fa, 7j, rii, A; ; 

p(«i, Pi-.C'KiCmiC\i\j3 {l \^ 2 \ C 7 , C„, C a , £/3 (i), £^(2), Sc 7 , Sc,,, £c a MA 7 i,Vi,h) 


P(&i, ^i,C^C m ,Cxi)p{(l>i, 7 i, Vi, Aj) 
Ka.^,C 7i ,C )?i ,C At l^),/3( 2 ),C 7 ,C,,C Ai S /3(l) , S /3(2) i S C 7 , £<7,, £c A ) 

p(ai, ( 3 * , C 7i , , C Ai ) 


(B.6) 
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